
feat: add victoria logs #39

Open
reshnm wants to merge 15 commits into main from feat/victoria-logs

reshnm (Contributor) commented Mar 24, 2026

What this PR does / why we need it:

Adds a complete log aggregation pipeline to the observability stack. Pod stdout/stderr logs from all nodes are collected by a new OpenTelemetry Collector DaemonSet, enriched with Kubernetes metadata, and stored in Victoria Logs, a high-performance log storage backend by VictoriaMetrics. The Victoria Logs UI is exposed through the same shared Envoy Gateway used for Prometheus, using HTTPS with mTLS client certificate authentication.

Key changes:

  • Victoria Logs: new StatefulSet, Service, and KRO ResourceGraphDefinition. Retention period, storage size, and resource requests/limits are all configurable via the ObservabilityStack CR.
  • OTel Collector DaemonSet (log-collector.yaml): reads /var/log/pods on every node, handles both Docker JSON and containerd/CRI-O log formats, extracts pod metadata (namespace, pod name, uid, container) from the file path, promotes JSON-structured log fields to indexed
    attributes, and ships logs to Victoria Logs via OTLP HTTP.
  • Shared Observability Gateway: Prometheus and Victoria Logs (UI + OTLP ingestion) are now served through a single consolidated Envoy Gateway in kustomizations/observability-gateway/, replacing the previous per-component gateway setup in kustomizations/prometheus/.
  • Cross-cluster log ingestion: a dedicated OTLP mTLS endpoint (otlp-logs.<gw-ns>.<base-domain>:8443) accepts logs from external Kubernetes clusters.
  • Cluster labeling: a k8s_cluster resource attribute is injected via the resource processor, allowing logs from multiple clusters to be differentiated in Victoria Logs queries.
  • E2E test: new test/e2e/victoria-logs.go + assess step in obs_stack_test.go that port-forwards to Victoria Logs and verifies at least one log entry is ingested within 10 minutes.
  • Documentation: README substantially expanded with log querying examples, mTLS cert extraction, cross-cluster ingestion guide, and Alertmanager configuration instructions.
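
To illustrate the metadata extraction described for the OTel Collector DaemonSet, here is a minimal Go sketch of recovering namespace, pod name, uid, and container from a log file path. It is not the collector configuration from this PR. It assumes the common kubelet layout `/var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log`, and the names `PodLogMeta` and `ParsePodLogPath` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// PodLogMeta holds the fields recoverable from a pod log file path.
// Hypothetical type, used only for this illustration.
type PodLogMeta struct {
	Namespace string
	Pod       string
	UID       string
	Container string
}

// podLogPath matches the assumed kubelet layout:
//   /var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log
// Namespace and pod names cannot contain underscores, so "_" works as a separator.
var podLogPath = regexp.MustCompile(
	`^/var/log/pods/([^_/]+)_([^_/]+)_([^_/]+)/([^/]+)/[^/]+\.log$`)

// ParsePodLogPath extracts pod metadata from a log file path, or returns
// an error if the path does not follow the assumed layout.
func ParsePodLogPath(path string) (PodLogMeta, error) {
	m := podLogPath.FindStringSubmatch(path)
	if m == nil {
		return PodLogMeta{}, fmt.Errorf("not a pod log path: %q", path)
	}
	return PodLogMeta{Namespace: m[1], Pod: m[2], UID: m[3], Container: m[4]}, nil
}

func main() {
	// Example path with made-up pod name and uid.
	meta, err := ParsePodLogPath("/var/log/pods/kube-system_coredns-abc12_0f1e2d3c/coredns/0.log")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", meta)
}
```

In the collector itself this step would typically be expressed as a regex operator in the log receiver configuration; the Go version only makes the path convention explicit.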

Which issue(s) this PR fixes:
Fixes openmcp-project/backlog#524

Special notes for your reviewer:

The Prometheus gateway resources have been deleted from kustomizations/prometheus/ and replaced by the consolidated kustomizations/observability-gateway/. The observability-gateway RGD now manages certs and gateway configuration for both Prometheus and Victoria Logs
(including the OTLP ingestion endpoint). Reviewers should pay attention to the patch ordering in resource-graph-definitions/observability-gateway.yaml to ensure hostname and cert DNS SANs are set correctly.
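
As a rough aid for the hostname and DNS SAN check mentioned above, the following Go sketch builds a gateway hostname from the `otlp-logs.<gw-ns>.<base-domain>` pattern and reports which hostnames a certificate does not cover. It is an illustration under assumptions, not code from this PR: the helpers `GatewayHost`, `MissingSANs`, and `selfSigned` are hypothetical, as are the namespace `obs-gw` and the domain `example.com`. A real check would load the certificate issued for the gateway instead of generating a throwaway one.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// GatewayHost builds "<prefix>.<gw-ns>.<base-domain>" (hypothetical helper).
func GatewayHost(prefix, gwNamespace, baseDomain string) string {
	return fmt.Sprintf("%s.%s.%s", prefix, gwNamespace, baseDomain)
}

// MissingSANs returns the hostnames that the certificate's SANs do not cover.
func MissingSANs(cert *x509.Certificate, hosts []string) []string {
	var missing []string
	for _, h := range hosts {
		if err := cert.VerifyHostname(h); err != nil {
			missing = append(missing, h)
		}
	}
	return missing
}

// selfSigned creates a throwaway self-signed certificate with the given
// DNS SANs. Demo only; it stands in for the gateway's real certificate.
func selfSigned(dnsNames []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "demo"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	otlp := GatewayHost("otlp-logs", "obs-gw", "example.com")
	cert, err := selfSigned([]string{otlp})
	if err != nil {
		panic(err)
	}
	// The second hostname is deliberately absent from the SAN list.
	hosts := []string{otlp, GatewayHost("not-in-cert", "obs-gw", "example.com")}
	fmt.Println("hostnames missing from SANs:", MissingSANs(cert, hosts))
}
```

Note that the port (8443 for the OTLP endpoint) is not part of a DNS SAN, so only the hostname is compared.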

Release note:

Add log aggregation to the observability stack. Pod stdout/stderr logs from all cluster nodes are now automatically collected by an OpenTelemetry Collector DaemonSet and stored in Victoria Logs. The Victoria Logs UI is accessible via the shared Observability Gateway using HTTPS and mTLS client certificate authentication (the same certificate used for Prometheus). Logs from additional Kubernetes clusters can be ingested via a dedicated mTLS OTLP endpoint. Victoria Logs retention period, storage size, and resource limits are configurable via the `ObservabilityStack` custom resource.

reshnm marked this pull request as draft on March 24, 2026, 10:29
reshnm force-pushed the feat/victoria-logs branch from 664bcff to 820016d on March 26, 2026, 12:54
reshnm requested a review from n3rdc4ptn on April 1, 2026, 11:22
reshnm marked this pull request as ready for review on April 1, 2026, 11:22
reshnm force-pushed the feat/victoria-logs branch from 95eb49e to 99a25ec on April 2, 2026, 10:45
reshnm requested a review from ValentinGerlach on April 2, 2026, 11:46

Development

Successfully merging this pull request may close these issues.

Task: Add log ingestion framework to observability stack
